Abstract

A number of recent studies have estimated the inter-galactic void probability function and investigated
its departure from various random models. We study a family of parametric statistical models based on
gamma distributions, which give realistic descriptions for other stochastic porous media. Gamma
distributions contain as a special case the exponential distributions, which correspond to the 'random'
void size probability arising from Poisson processes. The space of parameters is a surface with a natural
Riemannian metric structure. This surface contains the Poisson processes as an isometric embedding and a
recent theorem [1] shows that it contains neighbourhoods of all departures from randomness. The method
thereby provides a geometric setting for quantifying departures from randomness, on which may be
formulated cosmological evolutionary dynamics for galactic clustering and for the concomitant development
of the void size distribution.

The 2dFGRS data E] offer the possibility of more detailed investigation of this approach than was 
possible when it was originally suggested 01SJ[5| and some parameter estimations are given. 
1 Introduction

Several years ago, the author presented an information geometric approach to modelling a space of perturbations 
of the random state for galactic clustering and cosmological void statistics [7]. The present note is intended to
update somewhat and draw attention to this approach as a possible contribution to the interpretation of the 
data from the 2-degree field Galaxy Redshift Survey (2dFGRS), cf Croton et al. |SJ|H]. 

The classical random model is that arising from a Poisson process of mean density N galaxies per unit volume 
in a large box. Then, in a region of volume V, the probability of finding exactly m galaxies is 

$$ P_m = \frac{(NV)^m}{m!}\, e^{-NV}. \qquad (1) $$

So the probability that the given region is devoid of galaxies is $P_0 = e^{-NV}$. It follows that the probability
density function for the continuous random variable $V$ in the Poisson case is

$$ p_{\mathrm{random}}(V) = N e^{-NV}. \qquad (2) $$
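The Poisson formulas (1) and (2) can be checked numerically with a minimal sketch such as the following. The function names are illustrative assumptions and do not come from the original text; only the Python standard library is used.

```python
import math


def poisson_count_probability(m, density, volume):
    """Equation (1): probability of exactly m galaxies in a region of given volume."""
    expected = density * volume  # mean number of galaxies in the region, N V
    return expected ** m / math.factorial(m) * math.exp(-expected)


def void_probability(density, volume):
    """P_0 = exp(-N V): probability that the region contains no galaxies."""
    return math.exp(-density * volume)


def random_void_volume_pdf(volume, density):
    """Equation (2): exponential density of void volume in the Poisson case."""
    return density * math.exp(-density * volume)
```

For any choice of density and volume, `poisson_count_probability(0, density, volume)` should agree with `void_probability(density, volume)`, and the probabilities summed over m should approach one.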

In practice of course, measurements will depend on algorithms that specify threshold values for density ranges 
of galaxies in cells and the lowest range will represent the 'underdense' regions which include the voids; Benson 
et al. [3] discuss this.

A hierarchy of $N$-point correlation functions needed to represent clustering of galaxies in a complete sense was
devised by White 12 lj and he provided explicit formulae, including their continuous limit. In particular, he 
made a detailed study of the probability that a sphere of radius R is empty and showed that formally it is 
symmetrically dependent on the whole hierarchy of correlation functions. However, White concentrated his
applications on the case when the underlying galaxy distribution was a Poisson process, the starting point for 
the present approach which is concerned with geometrizing the parameter space of departures from a Poisson 
process. Croton et al. [o] found that the negative binomial model for galaxy clustering gave a very good 
approximation to the 2dFGRS, pointing out that this model is a discrete version of the gamma distribution. 



2 Modelling statistics of galaxy void sizes 

For a general account of large-scale structures in the universe, see Fairall [12]. Kauffmann and Fairall [TB]
developed a catalogue search algorithm for larger nearly spherical regions devoid of bright galaxies and obtained
a spectrum for radii of significant voids. This indicated a peak radius near 4 $h^{-1}$ Mpc and a long tail stretching
at least to 32 $h^{-1}$ Mpc, and is compatible with the recent extrapolation models of Baccigalupi et al. [2], which
yield an upper bound on void radii of about 50 $h^{-1}$ Mpc. These data of course omit the expected very
large numerical contribution of smaller voids. More recent work, notably that of Croton et al. [B], provides much
larger samples with improved estimates of void size statistics, and Benson et al. [3] gave a theoretical analysis
in anticipation of the 2dFGRS survey data, including the evaluation of the void and underdense probability
functions. Hoyle and Vogeley [13] provided detailed results for the statistics of voids larger than 10 $h^{-1}$ Mpc in
the 2dFGRS survey data; they concluded that such voids constitute some 40% of the universe and have a mean
radius of about 15 $h^{-1}$ Mpc.

The count density N(V) of galaxies observed in zones using a range of sampling schemes each with a fixed zone 
volume V results in a decreasing variance Var(N(V)) of count density with increasing zone size, roughly of the 
form 

$$ \mathrm{Var}(N(V)) \approx F(0)\, e^{-V/V_k} \quad \text{as } V \to \infty \qquad (3) $$

where $V_k$ is some characteristic scaling parameter. This monotonic decay of variance with zone size is a natural
consequence of the monotonic decay of the covariance function, roughly isotropically and of the form

$$ \mathrm{Cov}(r) \approx e^{-r/r_k} \quad \text{as } r \to \infty \qquad (4) $$

where $r_k$ is some characteristic scaling parameter of the order of magnitude of the diameter of filament structures;
this was discussed in jHJ. Then 

$$ \mathrm{Var}(N(V)) \approx \int_0^\infty \mathrm{Cov}(r)\, b(r)\, dr \qquad (5) $$

where b(r) is the probability density of finding two points separated by distance r independently and at random 
in a zone of volume V. The power spectrum using, say, cubical cells of side lengths R is given by the family of 
integrals 

$$ \mathrm{Pow}(N(R)) \approx \int_R^\infty \mathrm{Cov}(r)\, b(r)\, dr. \qquad (6) $$
Fairall [12] (page 124) reported a value $\sigma^2 = 0.25$ for the ratio of variance $\mathrm{Var}(N(1))$ to mean squared $\bar{N}^2$ for
counts of galaxies in cubical cells of unit side length. In other words, the coefficient of variation for sampling
with cells of unit volume is

$$ cv(N(1)) = \frac{\sqrt{\mathrm{Var}(N(1))}}{\bar{N}} = 0.5 \qquad (7) $$

and this is dimensionless.

We choose a family of parametric statistical models for void volumes that includes the random model (2) as
a special case. There are of course many such families, but we take one that a recent theorem [1] has shown
contains neighbourhoods of all departures from randomness and it has been successful in modelling void size
distributions in terrestrial stochastic porous media with similar departures from randomness. Also, the
complementary logarithmic version has been used in the representation of clustering of galaxies |§]. The
family of gamma distributions has event space $\Omega = \mathbb{R}^+$, parameters $\mu, \beta \in \mathbb{R}^+$ and probability density functions
given by

$$ f(V; \mu, \beta) = \left(\frac{\beta}{\mu}\right)^{\beta} \frac{V^{\beta-1}}{\Gamma(\beta)}\, e^{-V\beta/\mu}. \qquad (8) $$
Then $\bar{V} = \mu$ and $\mathrm{Var}(V) = \mu^2/\beta$, and we see that $\mu$ controls the mean of the distribution while the spread and
shape are controlled by $1/\beta$, the square of the coefficient of variation.

The special case $\beta = 1$ corresponds to the situation when $V$ represents the random or Poisson process in (2)
with $\mu = 1/N$. Thus, the family of gamma distributions can model a range of stochastic processes corresponding
to non-independent 'clumped' events, for $\beta < 1$, and dispersed events, for $\beta > 1$, as well as the random case
(cf. [1, 10, 11]). Thus, if we think of this range of processes as corresponding to the possible distributions of
centroids of extended objects such as galaxies that are initially distributed according to a Poisson process with
$\beta = 1$, then the three possibilities are:

Chaotic or random structure with no interactions among constituents, $\beta = 1$;
Clustered structure arising from mutually attractive type interactions, $\beta < 1$;
Dispersed structure arising from mutually repulsive type interactions, $\beta > 1$.
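As an illustration of the gamma family (8) and the three regimes just listed, the sketch below evaluates the density and labels a value of $\beta$. The function names and the tolerance argument are assumptions made for this example only.

```python
import math


def gamma_void_pdf(volume, mu, beta):
    """Equation (8): gamma density with mean mu and shape beta, for volume > 0."""
    log_pdf = (beta * math.log(beta / mu)
               + (beta - 1.0) * math.log(volume)
               - math.lgamma(beta)
               - volume * beta / mu)
    return math.exp(log_pdf)


def clustering_regime(beta, tol=1e-9):
    """Label the regime: beta = 1 random, beta < 1 clustered, beta > 1 dispersed."""
    if abs(beta - 1.0) <= tol:
        return "random"
    return "clustered" if beta < 1.0 else "dispersed"
```

With `beta = 1.0` the density reduces to the exponential form (2) with $N = 1/\mu$, which gives a quick consistency check of the parametrization.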

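For our gamma-based void model we consider the radius $R$ of a spherical void with volume $V = \frac{4}{3}\pi R^3$ having
distribution (8). Then the probability density function for $R$ is given by

$$ p(R; \mu, \beta) = \frac{4\pi R^2}{\Gamma(\beta)} \left(\frac{\beta}{\mu}\right)^{\beta} \left(\frac{4\pi R^3}{3}\right)^{\beta-1} e^{-4\pi R^3 \beta/(3\mu)}. \qquad (9) $$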

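The mean $\bar{R}$, variance $\mathrm{Var}(R)$ and coefficient of variation $cv(R)$ of $R$ are given, respectively, by

$$ \bar{R} = \left(\frac{3\mu}{4\pi\beta}\right)^{1/3} \frac{\Gamma\left(\beta + \frac{1}{3}\right)}{\Gamma(\beta)} \qquad (10) $$

$$ \mathrm{Var}(R) = \left(\frac{3\mu}{4\pi\beta}\right)^{2/3} \frac{\Gamma(\beta)\,\Gamma\left(\beta + \frac{2}{3}\right) - \Gamma\left(\beta + \frac{1}{3}\right)^2}{\Gamma(\beta)^2} \qquad (11) $$

$$ cv(R) = \frac{\sqrt{\mathrm{Var}(R)}}{\bar{R}} = \sqrt{\frac{\Gamma(\beta)\,\Gamma\left(\beta + \frac{2}{3}\right)}{\Gamma\left(\beta + \frac{1}{3}\right)^2} - 1} \qquad (12) $$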

The fact that the coefficient of variation (12) depends only on $\beta$ gives a rapid parameter fitting of data to the
probability density function for void radii (9). Numerical fitting to (12) gives $\beta$; this substituted in (10) yields
an estimate of $\mu$ to fit a given observational mean.
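The two-step fit described here can be sketched as follows: solve (12) for $\beta$ by bisection, then invert (10) for $\mu$. This is only a sketch under stated assumptions: the function names and the search bracket are hypothetical, and the bisection assumes that $cv(R)$ decreases monotonically with $\beta$ over the bracket.

```python
import math


def cv_radius(beta):
    """Equation (12): coefficient of variation of void radius, a function of beta only."""
    log_ratio = (math.lgamma(beta) + math.lgamma(beta + 2.0 / 3.0)
                 - 2.0 * math.lgamma(beta + 1.0 / 3.0))
    return math.sqrt(math.expm1(log_ratio))


def mean_radius(mu, beta):
    """Equation (10): mean void radius."""
    scale = (3.0 * mu / (4.0 * math.pi * beta)) ** (1.0 / 3.0)
    return scale * math.exp(math.lgamma(beta + 1.0 / 3.0) - math.lgamma(beta))


def fit_gamma_void_model(observed_mean_radius, observed_cv, lo=1e-3, hi=1e3):
    """Return (mu, beta) matching an observed mean radius and coefficient of variation."""
    if not cv_radius(hi) < observed_cv < cv_radius(lo):
        raise ValueError("observed cv lies outside the assumed search bracket")
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # bisect on a logarithmic scale in beta
        if cv_radius(mid) > observed_cv:
            lo = mid  # cv too large, so beta must be larger
        else:
            hi = mid
    beta = math.sqrt(lo * hi)
    # Invert equation (10) for mu at the fitted beta.
    gamma_ratio = math.exp(math.lgamma(beta) - math.lgamma(beta + 1.0 / 3.0))
    mu = 4.0 * math.pi * beta / 3.0 * (observed_mean_radius * gamma_ratio) ** 3
    return mu, beta
```

A round trip, computing `mean_radius` and `cv_radius` for chosen parameters and feeding them back into `fit_gamma_void_model`, should recover those parameters. Note that this applies to the untruncated distribution on $R > 0$ only.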

However, there is a complication: necessarily in order to have a physically meaningful definition for voids,
observational measurements introduce a minimum threshold size for voids. For example, Hoyle and Vogeley [13]
used algorithms to obtain statistics on 2dFGRS voids with radius $R > R_{min} = 10\ h^{-1}$ Mpc; for voids above this
threshold they found their mean size is about 15 $h^{-1}$ Mpc with a variance of about 8.1 $(h^{-1}\,\mathrm{Mpc})^2$. This of course is not
directly comparable with the above distribution for $R$ in equation (9) since the latter has domain $R > 0$. Now,
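from (9), the probability that a void has radius $R > A$ is

$$ P_{R>A} = \frac{\Gamma\left(\beta, \frac{4\pi A^3\beta}{3\mu}\right)}{\Gamma(\beta)} \qquad (13) $$

and hence the mean, variance and coefficient of variation for the void distribution with $R > A$ become: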

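$$ \bar{R}_{>A} = \left(\frac{3\mu}{4\pi\beta}\right)^{1/3} \frac{\Gamma\left(\beta + \frac{1}{3}, \frac{4\pi A^3\beta}{3\mu}\right)}{\Gamma\left(\beta, \frac{4\pi A^3\beta}{3\mu}\right)} \qquad (14) $$

$$ \mathrm{Var}(R_{>A}) = \left(\frac{3\mu}{4\pi\beta}\right)^{2/3} \frac{\Gamma\left(\beta, \frac{4\pi A^3\beta}{3\mu}\right) \Gamma\left(\beta + \frac{2}{3}, \frac{4\pi A^3\beta}{3\mu}\right) - \Gamma\left(\beta + \frac{1}{3}, \frac{4\pi A^3\beta}{3\mu}\right)^2}{\Gamma\left(\beta, \frac{4\pi A^3\beta}{3\mu}\right)^2} \qquad (15) $$

$$ cv(R_{>A}) = \frac{\sqrt{\mathrm{Var}(R_{>A})}}{\bar{R}_{>A}} = \sqrt{\frac{\Gamma\left(\beta, \frac{4\pi A^3\beta}{3\mu}\right) \Gamma\left(\beta + \frac{2}{3}, \frac{4\pi A^3\beta}{3\mu}\right)}{\Gamma\left(\beta + \frac{1}{3}, \frac{4\pi A^3\beta}{3\mu}\right)^2} - 1} \qquad (16) $$

where

$$ \Gamma(\beta, A) = \int_A^\infty t^{\beta-1} e^{-t}\, dt $$

is the incomplete gamma function with $\Gamma(\beta) = \Gamma(\beta, 0)$.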

Summarizing from Hoyle and Vogeley [13]:

$A = 10\ h^{-1}$ Mpc, $P_{R>A} \approx 0.4$, $\bar{R}_{>A} \approx 15\ h^{-1}$ Mpc, $\mathrm{Var}(R_{>A}) \approx 8.1\ (h^{-1}\,\mathrm{Mpc})^2$ so $cv(R_{>A}) \approx 0.19$.
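The threshold statistics can be explored numerically with a sketch like the one below. It assumes that the probability, mean, variance and coefficient of variation above a threshold radius $A$ follow from the gamma volume model through upper incomplete gamma functions evaluated at $4\pi A^3\beta/(3\mu)$. The incomplete gamma function is approximated by a composite Simpson rule; the function names, the integration cutoff and the step count are illustrative assumptions, and the quadrature is intended only for a strictly positive lower limit of moderate size.

```python
import math


def upper_incomplete_gamma(s, a, steps=20000):
    """Approximate Gamma(s, a), the integral of t^(s-1) e^(-t) from a to infinity, a > 0.

    Composite Simpson rule on [a, upper]; the tail beyond upper is neglected.
    """
    upper = a + 50.0 + 10.0 * s
    h = (upper - a) / steps  # steps must be even for Simpson's rule

    def integrand(t):
        return math.exp((s - 1.0) * math.log(t) - t)

    total = integrand(a) + integrand(upper)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(a + i * h)
    return total * h / 3.0


def threshold_void_statistics(mu, beta, threshold_radius):
    """Return (P, mean, variance, cv) for void radii above threshold_radius."""
    x = 4.0 * math.pi * threshold_radius ** 3 * beta / (3.0 * mu)
    g0 = upper_incomplete_gamma(beta, x)
    g1 = upper_incomplete_gamma(beta + 1.0 / 3.0, x)
    g2 = upper_incomplete_gamma(beta + 2.0 / 3.0, x)
    scale = (3.0 * mu / (4.0 * math.pi * beta)) ** (1.0 / 3.0)
    probability = g0 / math.gamma(beta)
    mean = scale * g1 / g0
    variance = scale ** 2 * (g0 * g2 - g1 * g1) / (g0 * g0)
    return probability, mean, variance, math.sqrt(variance) / mean
```

Scanning `threshold_void_statistics(mu, beta, 10.0)` over a grid of parameters and comparing against the summarized observational values is one possible way to locate candidate $(\mu, \beta)$ pairs; no fitted values are claimed here.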
Figure 1: Probability that a void will have radius $R > 10\ h^{-1}$ Mpc as a function of parameters $\mu, \beta$ from equation
(13). The range $\beta < 1$ corresponds to clustering regimes. The plane at level $P_{R>10} = 0.4$ corresponds to the
fraction 40% of the universe filled by voids, as reported by Hoyle and Vogeley [13].

3 Model coupling of clustering densities and voids 

Next we follow the methodology introduced in [HI El to provide a model that links the number counts in cells and
the void probability function and which contains perturbations of the random case. This exploits the central role
of the gamma distribution in providing neighbourhoods of randomness [1] that contain all maximum likelihood
nearby perturbations of the random case, and it allows a direct use of the linked information geometries for the
coupled stochastic processes of voids and galaxies. Clearly, in regions where the local void volume $V$ tends to
be small the local matter density $N$ will tend to be large. Since the matter density must be bounded, a
simple phenomenological model that couples the two random variables in the stochastic process is $N(V) = e^{-V}$,
where the upper bound on $N$ has been set to unity. This model was explored in [HI Ej and it is easy to show
that the probability density function for $N$ is given by the log gamma distribution

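$$ g(N; \mu, \beta) = \frac{1}{\Gamma(\beta)} \left(\frac{\beta}{\mu}\right)^{\beta} N^{\beta/\mu - 1}\, |\log N|^{\beta-1}. \qquad (17) $$

This distribution for local galactic number density has mean $\bar{N}$, variance $\mathrm{Var}(N)$ and coefficient of variation
$cv(N) = \sqrt{\mathrm{Var}(N)}/\bar{N}$ given by

$$ \bar{N} = \left(\frac{\beta}{\beta + \mu}\right)^{\beta} \qquad (18) $$

$$ \mathrm{Var}(N) = \left(\frac{\beta}{\beta + 2\mu}\right)^{\beta} - \left(\frac{\beta}{\beta + \mu}\right)^{2\beta} \qquad (19) $$

$$ cv(N) = \frac{\sqrt{\mathrm{Var}(N)}}{\bar{N}} = \sqrt{\left(\frac{(\beta + \mu)^2}{\beta(\beta + 2\mu)}\right)^{\beta} - 1}. \qquad (20) $$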



Using the reported value $cv(N(1)) = 0.5$ from Fairall [12] (page 124) for cubical volumes with side length
$R = 1\ h^{-1}$ Mpc, the curve so defined in the parameter space for the log gamma distributions (17) has maximum
clustering for $\beta \approx 0.6$, $\mu \approx 0.72$.
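The moments of the log gamma model follow from $N = e^{-V}$ with $V$ gamma distributed, since the expectation of $e^{-kV}$ is $(\beta/(\beta + k\mu))^{\beta}$. The sketch below encodes the resulting mean, variance and coefficient of variation; the function names are illustrative assumptions.

```python
import math


def log_gamma_mean(mu, beta):
    """Mean of N = exp(-V) when V is gamma distributed with mean mu and shape beta."""
    return (beta / (beta + mu)) ** beta


def log_gamma_variance(mu, beta):
    """Variance of N: second moment minus squared mean."""
    return (beta / (beta + 2.0 * mu)) ** beta - (beta / (beta + mu)) ** (2.0 * beta)


def log_gamma_cv(mu, beta):
    """Coefficient of variation of N, written in the closed form of equation (20)."""
    ratio = (beta + mu) ** 2 / (beta * (beta + 2.0 * mu))
    return math.sqrt(ratio ** beta - 1.0)
```

Evaluating `log_gamma_cv(0.72, 0.6)` gives a value close to the reported 0.5, which is consistent with the parameter pair quoted above lying near the level curve $cv(N) = 0.5$.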
Figure 2: Coefficient of variation of counts in cells $N$ for the log gamma distribution, equation (17). The range
$\beta < 1$ corresponds to clustering regimes. The three planes show the levels $cv(N) = 1, \sqrt{6}, \sqrt{10}$ as reported by
Fairall [12] and Croton et al. EJ.

From the 2-degree field Galaxy Redshift Survey (2dFGRS), Croton et al. jS] in Figure 2 reported the decay of
normalised variance, $\xi_2 = cv(N(R))^2$, with scale radius $R$ and the associated departure from randomness in the
form $\chi = -\log(P_0(R))/\bar{N}$, where $P_0(R)$ is the probability of finding zero galaxies in a spherical region of radius $R$
when the mean number is $\bar{N}$. From that Figure 2 we see that, for the data of the Volume Limited Catalogue
with magnitude range -20 to -21 and $\bar{N}(1) = 1.46$: $cv(N(1))^2 \approx 6$ and $\chi \approx 0.9$ at $R \approx 1$; also $cv(N(7))^2 \approx 1$
and $\chi \approx 0.4$ at $R \approx 7$.

Croton et al. §3 in Table 1 reported $\bar{N}$ values for cubical volumes with side length $R = 1\ h^{-1}$ Mpc in the range
$0.11 < \bar{N} < 11$. From Figure 3 in that paper we see that, at the scale $R = 1\ h^{-1}$ Mpc, $\log_{10} \xi_2 \approx 1$, which gives
a coefficient of variation $cv(N(1)) \approx \sqrt{10}$.
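The two quantities read from the survey figures are simple transformations, sketched below with illustrative function names. For a Poisson process $P_0 = e^{-\bar{N}}$, so the departure measure $\chi$ equals one in the random case and falls below one when voids are more probable than random.

```python
import math


def reduced_void_probability(p0, mean_count):
    """chi = -log(P_0) / N_bar; equals 1 for a Poisson process."""
    return -math.log(p0) / mean_count


def cv_from_normalised_variance(xi2):
    """Coefficient of variation from the normalised variance xi_2 = cv^2."""
    return math.sqrt(xi2)


def cv_from_log10_variance(log10_xi2):
    """Coefficient of variation when log10(xi_2) is read from a logarithmic plot."""
    return math.sqrt(10.0 ** log10_xi2)
```

For example, `cv_from_log10_variance(1.0)` reproduces the value $\sqrt{10}$ quoted for the scale $R = 1\ h^{-1}$ Mpc.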

The above-mentioned observations $cv(N) = 1, \sqrt{6}, \sqrt{10}$ are shown in Figure 2 on a plot of the coefficient of
variation for the number counts in cells from the log gamma family of distributions, equation (17). The range
$\beta < 1$ corresponds to clustering regimes.

Theoretical models for the evolution of galactic clustering through an evolving stochastic process subordinate
to the log gamma distribution (17) of densities could be represented as curves on the space of parameters with
the metric (25), interpreting the parameter changes with time in the appropriate way. The coupling with the
void probability function controlled by the gamma distribution (8) allows the corresponding void evolution
to be represented. It is of course very unlikely that this simple model is suitable in all respects, but given a
different family of distributions the necessary information geometry can be computed for the representation of
evolutionary processes.



4 Information geometry of gamma models for void volume statistics 

For stochastic processes subordinate to a given family of parametric statistical models, Shannon's information
theoretic 'entropy' or 'uncertainty' (cf. e.g. Jaynes [14]) is given, up to a factor, by the negative of the expectation
of the logarithm of the probability density function. For the family of models we propose for void volumes (8),
this entropy is

$$ S_f(\mu, \beta) = -\int_0^\infty \log(f(V; \mu, \beta))\, f(V; \mu, \beta)\, dV. \qquad (21) $$

In particular, at unit mean, the maximum entropy (or maximum uncertainty) occurs at $\beta = 1$, which is the
random case, and then $S_f(\mu, 1) = 1 + \log \mu$.
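The entropy integral has a standard closed form for gamma distributions, which in the $(\mu, \beta)$ parametrization reads $\beta + \log(\mu/\beta) + \log\Gamma(\beta) + (1 - \beta)\psi(\beta)$. This closed form is not stated in the text above and is used here as an assumption to illustrate the maximum at $\beta = 1$. The digamma routine is a small hand-written approximation (recurrence plus asymptotic series), and all function names are illustrative.

```python
import math


def digamma(x):
    """Approximate psi(x) for x > 0: shift x up to at least 6, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def gamma_entropy(mu, beta):
    """Closed-form Shannon entropy of the gamma density with mean mu and shape beta."""
    return (beta + math.log(mu / beta) + math.lgamma(beta)
            + (1.0 - beta) * digamma(beta))
```

At `beta = 1.0` this returns $1 + \log \mu$, and at unit mean the values on either side of $\beta = 1$ are smaller, in line with the random case being the state of maximum uncertainty.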

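The 'maximum likelihood' estimates $\hat{\mu}, \hat{\beta}$ of $\mu, \beta$ can be expressed in terms of the mean and mean logarithm of a
set of independent observations $X = \{X_1, X_2, \ldots, X_n\}$. These estimates are obtained in terms of the properties
of $X$ by maximizing the 'log-likelihood' function

$$ l_X(\mu, \beta) = \log lik_X(\mu, \beta) = \log\left(\prod_{i=1}^{n} f(X_i; \mu, \beta)\right) \qquad (22) $$

with the following result

$$ \hat{\mu} = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \qquad (23) $$

$$ \log \hat{\beta} - \psi(\hat{\beta}) = \log \bar{X} - \overline{\log X} \qquad (24) $$

where $\overline{\log X} = \frac{1}{n} \sum_{i=1}^{n} \log X_i$ and $\psi(\beta) = \frac{\Gamma'(\beta)}{\Gamma(\beta)}$ is the digamma function, the logarithmic derivative of the
gamma function.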
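A minimal sketch of the maximum likelihood recipe: the estimate of $\mu$ is the sample mean, and the estimate of $\beta$ solves $\log\beta - \psi(\beta) = \log\bar{X} - \overline{\log X}$, here by bisection. The function names, the search bracket and the simulated sample are assumptions for illustration; the digamma routine is a hand-written approximation.

```python
import math
import random


def digamma(x):
    """Approximate psi(x) for x > 0: shift x up to at least 6, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def gamma_mle(observations, lo=1e-6, hi=1e6):
    """Return (mu_hat, beta_hat) for positive observations."""
    n = len(observations)
    mean = sum(observations) / n
    mean_log = sum(math.log(x) for x in observations) / n
    target = math.log(mean) - mean_log  # non-negative by Jensen's inequality
    if target <= 0.0:
        raise ValueError("observations must not all be equal")
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        # log(beta) - psi(beta) decreases from +infinity towards 0 as beta grows.
        if math.log(mid) - digamma(mid) > target:
            lo = mid
        else:
            hi = mid
    return mean, math.sqrt(lo * hi)


# Usage on simulated void volumes; random.gammavariate takes (shape, scale),
# so mean mu = 3 and shape beta = 2 correspond to scale mu / beta = 1.5.
random.seed(0)
sample = [random.gammavariate(2.0, 1.5) for _ in range(20000)]
mu_hat, beta_hat = gamma_mle(sample)
```

With a sample of this size the estimates should lie close to the generating values, though the exact figures depend on the random seed.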

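The usual Riemannian information metric on the 2-dimensional parameter space $S = \{(\mu, \beta) \in \mathbb{R}^+ \times \mathbb{R}^+\}$ is
given by

$$ ds^2_S = \frac{\beta}{\mu^2}\, d\mu^2 + \left(\psi'(\beta) - \frac{1}{\beta}\right) d\beta^2 \quad \text{for } \mu, \beta \in \mathbb{R}^+. \qquad (25) $$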

The important point about this non-Euclidean metric on the space of parameters is that it derives from
log-likelihood properties and so it is the 'correct' one for this family of distributions. Given a different family,
the information geometry can be computed for that; the geometries are known also for bivariate gamma
distributions, Gaussian and multivariate Gaussian distributions, among others. For more details about the
geometry see [10].

The 1-dimensional subspace parametrized by $\beta = 1$ corresponds to the available 'random' processes. A path
through the parameter space $S$ of gamma models determines a curve

$$ c : [a, b] \to S : t \mapsto (c_1(t), c_2(t)) \qquad (26) $$

with tangent vector $\dot{c}(t) = (\dot{c}_1(t), \dot{c}_2(t))$ and norm $\|\dot{c}\|$ given via (25) by

$$ \|\dot{c}(t)\|^2 = \frac{c_2(t)}{c_1(t)^2}\, \dot{c}_1(t)^2 + \left(\psi'(c_2(t)) - \frac{1}{c_2(t)}\right) \dot{c}_2(t)^2. \qquad (27) $$

The information length of the curve is

$$ L_c(a, b) = \int_a^b \|\dot{c}(t)\|\, dt \qquad (28) $$

and the curve corresponding to an underlying Poisson process has $c(t) = (t, 1)$, so $t = \mu$ and $\beta = 1$ = constant,
and the information length is $\log\frac{b}{a}$.
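The information length of a curve in the $(\mu, \beta)$ parameter space can be approximated numerically by summing the metric lengths of short chords, using the diagonal metric with components $\beta/\mu^2$ and $\psi'(\beta) - 1/\beta$. The sketch below is one such approximation; the function names, the chord-midpoint evaluation and the step count are assumptions, and the trigamma routine is a hand-written approximation.

```python
import math


def trigamma(x):
    """Approximate psi'(x) for x > 0: shift x up to at least 6, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)  # psi'(x) = psi'(x + 1) + 1/x^2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv * inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 / 42.0))
    return result + inv + 0.5 * inv2 + series


def information_length(curve, a, b, steps=2000):
    """Approximate the information length of t -> (mu(t), beta(t)) on [a, b]."""
    length = 0.0
    prev_mu, prev_beta = curve(a)
    for i in range(1, steps + 1):
        mu, beta = curve(a + (b - a) * i / steps)
        mid_mu = 0.5 * (mu + prev_mu)
        mid_beta = 0.5 * (beta + prev_beta)
        d_mu = mu - prev_mu
        d_beta = beta - prev_beta
        # Metric evaluated at the chord midpoint.
        ds2 = (mid_beta / mid_mu ** 2) * d_mu ** 2 \
            + (trigamma(mid_beta) - 1.0 / mid_beta) * d_beta ** 2
        length += math.sqrt(ds2)
        prev_mu, prev_beta = mu, beta
    return length
```

For the Poisson curve `lambda t: (t, 1.0)` on $[a, b]$ the result should agree with $\log(b/a)$ to within the discretization error, which provides a direct check of the routine.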